Theoretical studies on RNA recognition by Musashi 1 RNA-binding protein

The Musashi (MSI) family of RNA-binding proteins, comprising the two homologs Musashi-1 (MSI1) and Musashi-2 (MSI2), typically regulates translation and is involved in cell proliferation and tumorigenesis. MSI proteins contain two ribonucleoprotein-like RNA-binding domains, RBD1 and RBD2, that bind single-stranded RNA motifs with a central UAG trinucleotide with high affinity and specificity. The finding that MSI also promotes the replication of Zika virus, a neurotropic Flavivirus, has triggered further investigations of the biochemical principles behind MSI–RNA interactions. However, a detailed molecular understanding of the specificity of MSI RBD1/2 interaction with RNA is still missing. Here, we performed computational studies of MSI1–RNA association complexes, investigating different RNA pentamer motifs using molecular dynamics simulations with binding free energy calculations based on the solvated interaction energy method. Simulations with Alphafold2 suggest that predicted MSI protein structures are highly similar to experimentally determined structures. The binding free energies show that two out of four RNA pentamers exhibit a considerably higher binding affinity to MSI1 RBD1 and RBD2, respectively. The obtained structural information on MSI1 RBD1 and RBD2 will be useful for a detailed functional and mechanistic understanding of this type of RNA–protein interactions.


Scientific Reports
| (2022) 12:12137 | https://doi.org/10.1038/s41598-022-16252-w www.nature.com/scientificreports/ determined as (G/A)U n AGU (n = 1-3) by an in vitro selection approach (SELEX) 4 . NMR titration experiments with a series of RNA oligomers revealed that MSI1 RBD1 and RBD2 bind to GUAG and UAG motifs with high affinity 13,14 . These data are in line with cross-linking and immunoprecipitation (iCLIP) studies, which revealed the trinucleotide sequence UAG in a single-stranded structural context, predominantly in the 3'UTRs of mRNAs, as a core Musashi binding element (MBE) 15,16 . Likewise, quantitative fluorescence anisotropy assays confirmed the binding specificity of the UAG trinucleotide, while nucleotides outside this core MBE have limited contribution to the overall binding free energy 17 . A different aspect of MSI pathobiology has been recently elucidated, i.e., the role of MSI proteins as host factors in viral infections, specifically their capacity to promote Zika virus (ZIKV) replication 18 . ZIKV is a mosquitoborne Flavivirus (MBFV) that has been circulating for decades in Africa and Asia, often being misdiagnosed as dengue. During a ZIKV outbreak in the Americas in 2015-2017, an unexpectedly high number of congenital malformations coupled with intrauterine growth restrictions, placental damage, and microcephaly has been associated with ZIKV infections 19 . While MBFVs are typically horizontally transmitted between arthropod vectors and vertebrate hosts, the capacity for transplacental passage aligns ZIKV with a handful of other MBFVs, including West Nile virus (WNV) and Powassan virus (POWV), that have been shown to cause placental infection and fetal neuropathology 20 . The presence of UAG-containing MBEs in the 3'UTRs of Flavivirus genomes, together with in vivo data revealing that MSI not only interacts with ZIKV RNA but also enhances viral replication, has led to the understanding that MSI is involved in ZIKV-induced neurotropism 18 . It has been hypothesized that MSI might stabilize viral RNA, thereby maintaining a sufficient RNA level that is not translated but subjected to purposeful exoribonuclease degradation 21 . The latter results in the production of short flavivirus RNA (sfRNA), which modulates cellular mRNA decay 22 and antiviral interferon response 23,24 . While these findings highlight the instrumental role of MSI in virus-associated cytopathicity, the biochemical foundations and mechanisms of the MSI-mediated congenital neuropathology remain elusive.
Computational prediction of the structural accessibility of RNA binding motifs is a promising approach for the characterization of RNA-protein binding sites. This idea has been applied to several eukaryotic RBPs, resulting in the observation that target site accessibility almost always increases the ability to predict sequence-specific RBP-RNA binding 25 . We have recently addressed the question as to whether other Flaviviruses have a similar MSI-mediated neurotropic potential to ZIKV by analyzing the affinity of Musashi binding elements (MBEs) in 3'UTR regions to appear in a single-stranded structural context, which is a requirement for efficient MSI-RNA interaction 21 . To this end, we have shown that the structural accessibility of MBEs along viral RNA molecules can be used as a proxy for predicting MSI-RNA interactions, thereby assessing the neurotropic potential of viruses. By employing a thermodynamic model of RNA folding based on the ViennaRNA package 26 , we computed the average opening energy that is necessary to keep specific MBEs in an unpaired structural context, rendering them accessible for MSI RRM-RNA interaction. Our data highlighted that MBEs in the 3' untranslated region (3'UTR) of ZIKV are highly accessible for MSI binding, thereby corroborating earlier studies that addressed the neurotropic potential of flaviviruses and alphaviruses 20 .
Here we follow up on this idea and model the 3D structure of MSI RBDs with Alphafold2. Subsequently, we investigate MSI-RNA association complexes, employing molecular dynamics (MD) approaches to gain more insight into the molecular traits of this type of RNP binding. Specifically, we focus on the published MSI1 RBD1-RNA complex and MSI1 RBD2-RNA complex (PDB IDs 2RS2 13 and 5X3Z 14 ), as shown in Fig. 1A and B, which were derived from NMR spectroscopy. Superimposition of MSI1 RBD1 and MSI1 RBD2 NMR structures and their sequence alignment is shown in Supplementary, Figures S1A and S1B. The RMSD between RBD1 and RBD2 structure is 0.997 Å. The RNA component of the complex comprises a canonical MBE with the pentamer sequence GUAGU. We set out to mutate individual RNA nucleotides to study the energetics of MSI1 binding  www.nature.com/scientificreports/ to alternative RNA motifs. To this end, we selected three additional pentamers, i.e., GUUGU, GGAGU, and GAUGU, whose central trinucleotides exhibited high, medium, and low affinities, respectively, within the thermodynamic ensemble of ZIKV 3'UTRs 21 .

Materials and methods
Protein structure prediction with Alphafold2. AlphaFold2 27 is an artificial intelligence (AI) approach for highly accurate protein structure prediction. In combination with MMseqs2 28 , a program for protein sequence search within large databases and generation of high quality protein sequence alignments, Alphafold2 is capable of simulating high accuracy structures for a wide range of proteins, for which structural data are unavailable. Here, we performed predictions for MSI1 RBD1 and MSI1 RBD2 using ColabFold 29 , which couples MMseqs2 and AlphaFold2 in publicly available notebooks that can be executed on the Google Cloud infrastructure. We were specifically interested in determining the protein structures in the apo form and comparing these to structures available through PDB. The sequences of MSI1 RBD1 and MSI1 RBD2 were retrieved from PDB IDs 2RS2 and 5X3Z. The first candidate structure (model 1) of both RBDs from ColabFold was selected as the initial conformation to assess GUAGU binding to both RBDs by MD simulations.

Molecular dynamics simulations.
The top five NMR structures of MSI1 RBD1/2:GUAGU complexes were retrieved from PDB IDs 2RS2 and 5X3Z. The LEaP module of AMBER16 30 was used to construct complexes with three alternative RNA pentamers (GUUGU, GGAGU, and GAUGU) by modifying the central trinucleotides. The protonation states of RNA-protein complexes were computed using the PDB2PQR server at pH 7.4. The AMBER ff14SB and chiOL3 (OL3) force fields 30 were employed for protein and RNA, respectively. According to standard procedures, the missing hydrogen atoms of each system were added by the LeaP module. The added hydrogen atoms were then minimized for 1000 steps by steepest descents (SD) and subsequently by 3000 steps of conjugated gradient (CG). Subsequently, solvation of each system was performed by TIP3P water molecules 31 of approximately 6800 atoms for RBD1 and 7300 atoms for RBD2 in a periodic box at a distance of 12 Å apart from the protein surface, resulting in a box dimension of 63 × 70 × 62 Å 3 , and 70 × 66 × 63 Å 3 , respectively. The systems were neutralized using Na + counter ions. Periodic boundary condition with isothermal-isobaric ensemble (NPT) ensemble and a step-size of 2 fs for the simulation time were applied. The water molecules and ions were then minimized with 1000 steps of the steepest descent (SD) and continued with 3000 steps of the conjugate gradient (CG) method. The entire system was fully minimized in the last step by the same minimization process. All bonds with hydrogen atoms were constrained using the SHAKE algorithm 32 . MD simulations under periodic boundary conditions were performed five times for all systems using the AMBER16 software package 30 . The MD simulation started by heating the system from 10 to 310 K. Next, the system was equilibrated at a constant temperature of 310 K. 100 ns MD simulation was performed under NPT conditions at 1 atm and 310 K. The last 20 ns MD trajectories were taken for structural and energetics analyses. Root-mean-square displacement (RMSD) and distance between the centers of mass of protein and RNA were calculated by the cpptraj module of AmberTools16 33 . The interactions between protein and RNA were visualized and analyzed using Discovery Studio Visualizer 34 . Additionally, the solvated interaction energy (SIE) 35 method was applied to estimate the binding affinities of MSI1 RBD1/2 RNA complexes, and to determine the binding contribution of each nucleotide. SIE is an end-point physics-based scoring function that approximates the binding free energy from the force-field non-bonded interaction terms, continuum solvation, and configurational entropy linear compensation 35 . For each individual simulation, the SIE binding free energy of the complex was calculated over 200 snapshots from the last 20 ns (1000 snapshots in total) using the equation: The binding affinity prediction was estimated by summation of Coulomb interactions (∆E c ) and van der Waals interactions (∆E vdW ), the electrostatic solvation contribution (∆G R ), reaction field energy, and nonpolar desolvation energy. ∆E c and van der Waals interactions of the bound state were calculated with AMBER ff14SB and OL3 molecular mechanics force fields. The electrostatic solvation contribution was carried out using the continuum dielectric model with a solute interior dielectric constant and a solvent dielectric constant. The reaction field energies were considered by the Poisson equation with the boundary element method program. The nonpolar desolvation was estimated by a linear proportionality with the change in the solute molecular surface area 35 . Note that the global proportionality coefficient associated with the loss of conformational entropy upon binding (α) is 0.104758, while the solute interior dielectric constant (D in ) is 2.25. The molecular surface area coefficient (γ) is 0.012894 kcal/mol −1 Å −2 , ΔMSA(ρ) is the difference in molecular surface area between the bound and free state of the protein and constant (C) is − 2.89 kcal/mol −1 . These parameters were optimized by fitting to the absolute binding free energy 36,37 The binding affinity values of the canonical RNA motif (GUAGU) and three modified RNA motifs (GUUGU, GGAGU, and GAUGU) with MSI1 RBD1 and MSI1 RBD2 from the SIE method were taken from the 200 snapshots of the last 20 ns of the five models of each system (1000 snapshots in total). For the amino acids involved in each nucleotide binding of the four RNAs, ΔG bind,res calculations based on the MM/ PBSA method were performed on the same series of 1000 snapshots.

Results
Structure prediction of MSI1 RBD1/2:GUAGU . For the five predicted structures of MSI1 RBD1/2 from Alphafold2, the number of sequences per position and the per-residue confidence metric (pLDDT) are used to determine the validity of the Alphafold2 results ( Figure S2). For MSI1 RBD1, the core structure is covered by  Figure S2A). Likewise, the model confidence at each position increases up to 90% and drops to 70% and 40%, respectively, at the flexible loops and C-terminal. Interestingly, all predicted structures of the five models of MSI1 RBD1 exhibit a similar structure, except in the C terminal region. A similar situation is found for the MSI1 RBD2 models, except that the model confidence at the two terminals is lower to some extent ( Figure S2B). The core structure of the predicted models MSI1 RBD1/2 is comparable to the experimentally solved structures (PDB IDs 1UAW and 5X3Y), in particular, RBD2 with the RMSD value 1.777 and 0.837 Å, respectively (Fig. 2). For further investigations, MD simulations of the protein-RNA association complex were performed. To this end, 100 ns MD simulations were applied on the complex between model 1 of the Alphafold2 simulations, and the canonical RNA pentamer (GUAGU), which has been extracted from the corresponding NMR structure. The root-mean-square displacement (RMSD) during the simulation was evaluated from the geometric coordinates of all atoms of the complex, as well as from the RBD site with respect to those of the initial structures. As shown in Figure S3A, the RMSD values of the predicted MSI1-RBD1:GUAGU increase up to ~ 5.0 Å during the first 20 ns, then decrease to ~ 3.1 Å with a fluctuation of approximately 0.5 Å until the end of the simulation. For MSI1-RBD2 ( Figure S3B), RMSD increase is found within the first 20 ns and maintained at around 6.0 Å with a fluctuation at 1.0 Å up to 100 ns. The RBD site exhibits a much lower RMSD of ~ 1.0-1.7 and ~ 2.0-2.3 Å in both systems, respectively. This implies high fluctuation at the protein terminals, especially at the C terminal end, as well as flexible loops, and the 3' end of the GUAGU pentamer ( Figure S3).
To estimate the canonical RNA binding affinity, the SIE method was employed on 200 snapshots taken from the last 20 ns. The ΔG bind results of MSI1-RBD1 (− 16.77 ± 0.66 kcal/mol) and MSI1-RBD2 (− 16.54 ± 0.99 kcal/ mol) are comparable, and the Coulomb interaction plays a significant role in RNA binding, approximately 2-3 times higher than the vdW interaction ( Table 1). The energy contributions of the residues for RNA recognition ( Figure S4) show that the 5'-G of GUAGU RBD1 interacts with Trp29 in (black), while RBD2 connects with Asp143. Likewise, U2 interacts with Phe23, Gly26, Phe63, and Lys93 in RBD1, while in RBD2, stabilization is detected by Phe112 and Gly115. The remaining nucleotides of the core MBE, i.e. A3, and G4, are stabilized by

Molecular dynamics study of MSI1-RBD1/2 with alternative RNA motifs. In addition to study-
ing the MSI1-RBD1/2 in complex with GUAGU, which has been obtained from NMR structures, we set out to explore three alternative RNA pentamers, i.e., GUUGU, GGAGU, and GAUGU, by MD simulations. The overall tightness of the MSI1-RBD1 and MSI1-RBD2 in complex with these four RNA pentamers bound was assessed by radius of gyration (R g ) in Figure S5. To this end, nucleotides of the pentamer triplet cores were adjusted using the NMR structure to obtain starting geometries for MD. The last snapshots from all simulations were superimposed and are depicted in Figure S6, while the root mean square fluctuation (RMSF) of protein residues is shown in Figure S7. MSI1-RBD1/2 in complex with the canonical RNA GUAGU show the highest stability among all complexes, i.e., the pentanucleotide is well accommodated within the RBD site. The most considerable difference in pentanucleotide conformation is found in the GAUGU system. As shown in Fig. 3, the central trinucleotides of all models are placed significantly closer to the protein center (distance distribution of ∼10-12 Å) than the flanking nucleotides (∼14-18 Å). The structural fluctuation of the C-terminal ( Figures S6 and S7) is related to the high mobility of the RNA 3' end, as seen by large interquartile ranges in Fig. 3. A change from A to U at position 3 (GUUGU) moves the C-terminal closer to the 3' end in RBD1, leading to better stabilization. For GGAGU, the substitution from U to G at the second position results in increased distances of this nucleotide in both RBDs as well as at the 5' end in RBD1 and the two ends in RBD2. Interestingly, changing two nucleotides of the trinucleotide core, leading to GAUGU, results in significantly lengthened distances. Remarkably, the range of the distance distributions is substantially wider in the case of GAUGU compared to the original GUAGU pentamer. By considering the distance plot, the structural fluctuation of RNAs within the RBD1/2 site is ranked in the order of GUUGU<GUAGU<GGAGU<<GAUGU in RBD1; and GUAGU<GGAGU<GUUGU<<GAUGU in RBD2. In other words, RNA motifs with less structural fluctuation show a higher affinity for MSI1-RBD1/2.
The SIE method was applied for ΔG bind calculations to predict the pentanucleotide binding strength to MSI1-RBD1/2. From Fig. 4 (Table 1), while the decreased Coulomb interaction (~ 2-fold) in MSI1-RBD2 is compensated by the reduction in the change of the reaction energy upon binding (2-fold). Although the resulting ΔG bind follows the same trend as the structural data above, RNA-protein interactions must be taken into consideration for RNA recognition by a specific protein. From this perspective, the binding of each nucleotide was evaluated by using the SIE binding free energy and MM/PBSA per-residue decomposition free energy calculations.
The highest binding affinity of GUUGU to RBD1 in Fig. 4A can be explained by a strong binding of the central trinucleotide UUG of − 5.22 ± 0.18, − 6.09 ± 0.35, and − 6.56 ± 0.42 kcal/mol (Fig. 4B and Table S2). The trinucleotide binding is slightly weaker in GUAGU. The binding free energies of the remaining pentanucleotides GGAGU and GAUGU are significantly weaker, as can also be seen from the individual nucleotide contributions. In the case of RBD2, the GUAGU pentamer has the lowest binding free energy, whose trinucleotide binding free  (Fig. 4A), the other pentanucleotides exhibit a substantially weaker binding. Figure 5 shows the residue contributions for each nucleotide binding MSI1-RBD1 and RBD2. Negative and positive ΔG bind,res values represent the nucleotide stabilization and destabilization, respectively. Most RNAinteracting residues are found on the beta-sheet face; However, certain residues in the flexible loop regions also interact with RNA. For RBD1 (Fig. 5A), the G at the 5' end interacts with Trp29 in all models (black). The U at position 2 of the GUAGU and GUUGU pentamers has interactions with Phe23, Gly26, Phe63, and Lys93, while the G2 of GGAGU binds with Gly26, Asp91, and Arg99. The situation is different for GAUGU. Although the A is stabilized by Gly26 and Phe63, it is destabilized by Asp91, which is in agreement with our binding free energy data (Fig. 4B). The central nucleotide (position 3) of GUAGU, GUUGU and GGAGU interacts with Ala95 and the three phenylalanines Phe23, Phe63, and Phe96 13 . The energy contribution of Ala95 is reduced for the central nucleotide of GAUGU. Among all RNAs, the positively charged residues Arg98 and Arg99 provide the highest stabilization to G4 of GUUGU, relating to its highest binding affinity (Fig. 4B). Additionally, Lys21, Met52, and Phe65 are also important for the binding of this nucleotide. Their contributions are lowered in the GAUGU model. At the 3' end, we observe stabilization from positively charged residues at the C-terminal: Arg98 and Arg99 in GUAGU; Arg61 and Arg98 in GUUGU; and Arg61, Arg98 and Arg99 in GAUGU. These contributions are substantially lower in GGAGU.
For RBD2 the 5' G of the GUAGU pentanucleotide interacts with Val118, which is located at a structurally similar position as Trp29 in RBD1. The second nucleotide of all pentamers has a weak interaction with Gly115 and Phe152, while the third nucleotide of all pentamers interacts with Phe112 and Phe152, which correspond to residues as Phe23 and Phe63 in RBD1. The fourth nucleotide of all pentamers interacts with Lys110, Phe154, Gln185, and Lys187. For GAUGU, we observed a repulsive interaction between the 5'-terminal G and Lys177, and A2 and Glu180, thus highlighting the poor interaction of GAUGU with MSI1-RBD2 (Fig. 5B). Furthermore, at protein-RNA interfaces, stacks can be intermolecular, formed by rings of the nucleic acid bases with the aromatic side-chains of phenylalanine, arginine, alanine, lysine, glycine, and methionine. However, stacking interactions do not appear to provide significant sequence specificity in protein-RNA complexes ( Figure S8).

Discussion
Musashi genes have attracted considerable interest as regulators of stem and progenitor cell characteristics. In the present study, we evaluated the three-dimensional structures of the MSI1 in complex with RNA. To this end, we studied the association of MSI1 RNA-binding domains 1 and 2 (RBD1 and RBD2) with different RNA motifs. We investigated the canonical RNA motif GUAGU, as well as the alternative motifs GUUGU (good binding affinity), GGAGU (weaker binding affinity), and GAUGU (unfavorable binding affinity). We compared the protein structures without RNA from the PDB with Alphafold2-predicted geometries and found that protein structures of the binding domains are highly similar, therefore no conformational changes on the protein occur upon binding of the RNA. In addition, our results corroborate earlier findings that MSI1 RBD1 and RBD2 structures are remarkably similar, despite variation in the underlying primary sequence 21 . To investigate the properties of the RNA-protein association complexes, we performed molecular dynamics simulations and computed the interaction energies by the SIE method.
In agreement with earlier results 17 , the central trinucleotides of the RNA pentamers (Musashi binding element, MBE) are more rigid than the flanking nucleotides. Moreover, the flanking nucleotides lack interaction with MSI1 RBDs 13 , suggesting that MSI1-RBD1 and RBD2 require the central trinucleotides for recognition. Our MD simulations show that the central trinucleotides of the RNA motifs exhibit a significantly lower distance to the MSI1 RBDs than the enclosing nucleotides. Thus, the central trinucleotides play an important role in the interaction of MSI1-RBD1 and RBD2 with RNA.
The SIE calculations lead to the following conclusions: Assessment of the contributions to the overall binding free energy of individual nucleotides of the GUAGU and GGAGU motifs shows that the central core nucleotides have the largest interaction energies, with A3 and G4 nucleotides exhibiting the most pronounced contribution. The flanking nucleotides contribute significantly less. Our calculations show that for RBD1, the GUUGU motif possesses the largest binding free energy, followed by GUAGU. While this appears counterintuitive, it is in line with earlier data that assessed opening energy z scores at the level of RNA secondary structures. RBD2 on the other side has overall smaller interaction energy, with the GUAGU motif showing the highest binding affinity for all pentamers.
Calculated decomposition energies clearly show the contributions of individual amino acids to the complexation of the RNA. For RBD1, we should highlight Phe23, Phe63 and Phe65 because of their substantial interaction with the core motif. In analogy, Phe112, Phe152 and Phe154 of RBD2 show a strong interaction with the core trinucleotides.
In summary, we show here the feasibility of MD and SIE calculations to investigate the selectivity of RNA-protein interaction complexation. Further studies are warranted, such as the binding of a longer RNA chain that includes both binding motifs of the two RBDs of Musashi proteins. MSI1 plays a particularly important role in brain development, and increased expression of Musashi proteins in patients infected with Zika virus during pregnancy has been associated with microcephaly. A better understanding of the interaction of the MSI proteins with RNA, in particular, Zika virus RNA is required to address the issue of inhibiting Zika virus replication in infected patients without affecting brain development.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.